CONFIDENCE SETS FOR A CHANGE-POINT
David  Siegmund 


Summary. 


Several methods are discussed for confidence set estimation of a change-point in a
sequence of independent observations from completely specified distributions. The method based
on the likelihood ratio statistic is extended to the case of independent observations from a
one-parameter exponential family. Joint confidence sets for the change-point and the parameters
of the exponential family are also considered.


1.  Introduction. 

Let $x_1, x_2, \ldots, x_m$ be independent random variables with $x_1, \ldots, x_j$ having distribution
F and $x_{j+1}, \ldots, x_m$ having distribution $G \ne F$. The change-point j, where the distribution
shifts  from  F  to  G,  is  an  unknown  parameter,  to  be  estimated  by  a  confidence  set.  In  general, 
the  distributions  F  and  G  may  be  known,  completely  unknown,  or  specified  up  to  an  unknown 
parameter.  In  this  paper  I  discuss  several  procedures  for  the  artificial  but  informative  case 
of  completely  specified  F  and  G,  and  then  develop  more  completely  a  method  based  on  the 
likelihood  ratio  statistic  for  the  case  where  F  and  G  come  from  a  common  one  parameter 
exponential  family  of  distributions.  Precedent  for  the  approach  taken  here  is  found  in  Worsley 
(1986)  and  Siegmund  (1986). 

Section 2 is concerned with known F and G. In addition it is assumed that the sequence
of observations is actually doubly infinite, $\ldots, x_{-1}, x_0, x_1, \ldots$. This additional assumption has
little effect if m is large and it is known that j is close neither to 1 nor to m, because observations
far  from  the  change-point  carry  little  information  about  the  location  of  the  change-point. 
The  virtue  of  the  assumption  is  that  it  makes  j  into  a  location  parameter  and  provides  an 
exact  ancillary  statistic:  the  class  of  shift  invariant  events.  Five  confidence  set  estimates  are 
discussed.  Three  are  studied  by  Siegmund  (1986),  in  the  context  of  estimating  a  change-point 
in the drift of Brownian motion. The fourth is essentially the suggestion of Cobb (1978),
and  the  fifth  has  smallest  expected  size  among  all  shift  invariant  confidence  sets.  Section  3 
compares  the  different  confidence  sets. 

Sections 4 and 5 are concerned with the case that F and G are imbedded in a common
one-parameter exponential family, whose parameter $\theta$ is unknown. Section 4 develops a method
based on the likelihood ratio statistic for obtaining exact confidence sets for j. A new, fairly
simple  approximation  is  suggested  for  the  required  probability  calculation.  The  approximation 
is  illustrated  on  the  coal  mining  accident  data  along  the  lines  discussed  by  Worsley  (1986). 
Section  5  involves  the  special  case  of  normal  distributions  with  j  denoting  a  change  in  the 
mean.  The  likelihood  ratio  method  is  extended  to  give  a  joint  confidence  set  for  j  and  the 
difference  between  the  two  means. 

2.  The  Cases  of  Known  F  and  G. 

Let $\mathbb{Z}$ denote the integers and let $j \in \mathbb{Z}$. Let $x_n$, $n \in \mathbb{Z}$, be a sequence of independent
random variables with $x_n$ having the distribution function F or G according as $n \le j$ or
$n > j$. The distributions F and G are assumed known; the change-point j is unknown. Let
$P_j$ denote the probability measure induced by this model on the space of infinite sequences
$\omega = (x_n, n \in \mathbb{Z})$. Let $\sigma$ denote the shift operator, i.e., the mapping which takes
$\omega = (x_n, n \in \mathbb{Z})$ into $\sigma\omega = (x_{n+1}, n \in \mathbb{Z})$. Note that the family $\{P_j, j \in \mathbb{Z}\}$ is a
translation family in the sense that for any event B and $j \in \mathbb{Z}$

\[ P_j(B) = P_j(\omega \in B) = P_0(\sigma^{-j}\omega \in B) = P_0(\sigma^{j} B). \]

Let $z_n = \log\{dG(x_n)/dF(x_n)\}$ denote the log likelihood ratio of $x_n$, and put

\[ S_n = \begin{cases} z_1 + \cdots + z_n & (n \ge 1) \\ -(z_{n+1} + \cdots + z_0) & (n \le -1) \\ 0 & (n = 0). \end{cases} \]


Let $\ell_i = dP_i/dP_0$ denote the likelihood function at i. By considering the finite sequence
$z_n$, $-N \le n \le N$, and then letting $N \to \infty$, one can easily show that $\ell_i = \exp(S_i)$. Under $P_0$
the log likelihood process $(S_n, n \in \mathbb{Z})$ is a random walk satisfying $S_0 = 0$ and having increments
$S_n - S_{n-1}$ with mean $\int \log(dG/dF)\,dF < 0$ for $n > 0$ and $\int \log(dF/dG)\,dF > 0$ for $n \le 0$.

The maximum likelihood estimator for j is the value $\hat{j}$ where the process $(S_n, n \in \mathbb{Z})$
assumes its maximum value. In general this value need not be unique, but to avoid technicalities
it is assumed to be so in what follows. In the space of the sufficient statistic $(S_n, n \in \mathbb{Z})$, the
sequence $Y_i = S_{\hat{j}+i} - S_{\hat{j}}$, $i \in \mathbb{Z}$, is ancillary.

In  the  context  of  estimating  a  change-point  in  the  drift  of  a  Brownian  motion  process, 
Siegmund  (1986)  compares  the  following  three  confidence  sets  for  the  change-point  j.  The 
first  two  were  discussed  earlier  by  Hinkley  (1970,  1972),  who,  however,  made  no  attempt  to 
establish  their  relative  efficiency. 

(i) Since $\hat{j} - j$ is pivotal, if $r = r_\alpha$ is defined by $P_0(|\hat{j}| > r) = \alpha$, then $C_1 = [\hat{j} - r, \hat{j} + r]$
is a $(1 - \alpha)100\%$ confidence interval.

(ii) Let $A_j$ denote the acceptance region of a size $\alpha$ likelihood ratio test of the hypothesis
that the change-point is j, i.e., $A_j = \{\max_n S_n - S_j \le \eta\}$, where $\eta = \eta_\alpha$ satisfies
$P_j(A_j) = \{P_0(\max_{n \ge 0} S_n < \eta)\}^2 = 1 - \alpha$. Then the set $C_2$ of $n \in \mathbb{Z}$ such that the observed
sample point $\omega \in A_n$ is a $(1 - \alpha)100\%$ confidence set. Since the log likelihood process
$(S_n, n \in \mathbb{Z})$ is in general multimodal, this confidence set is not in general an interval.

(iii) A modification of the preceding method which always yields an interval is to define

\[ L\,(R) = \min\,(\max)\,\{n : S_n \ge \max_i S_i - \eta'\}, \]

which for suitable $\eta' < \eta$ satisfies

\[ P_j(L \le j \le R) = P_0(L \le 0 \le R) = 1 - 2P_0(R < 0) = 1 - \alpha. \]
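
As an illustration of methods (ii) and (iii), the following Python sketch takes a finite list of log likelihood values (a stand-in for the doubly infinite process, which is an assumption of this example) and a threshold that is assumed to have been calibrated already. The function names and the toy data are hypothetical and are not part of the paper.

```python
def lr_confidence_set(loglik, eta):
    """Method (ii): every index whose log likelihood is within eta of the maximum."""
    top = max(loglik)
    return [n for n, s in enumerate(loglik) if s >= top - eta]


def lr_confidence_interval(loglik, eta_prime):
    """Method (iii): the interval [L, R] spanned by the indices within eta_prime of the maximum."""
    members = lr_confidence_set(loglik, eta_prime)
    return members[0], members[-1]


# Toy bimodal log likelihood indexed by candidate change-points 0..8 (made-up numbers).
loglik = [-9.0, -4.0, -1.0, -5.0, -6.0, 0.0, -2.5, -7.0, -12.0]
print(lr_confidence_set(loglik, 3.0))       # a disconnected set
print(lr_confidence_interval(loglik, 3.0))  # its interval hull
```

With a multimodal log likelihood the set from (ii) can skip low-likelihood indices that the interval from (iii) is forced to include, which is why (iii) is paired with a smaller threshold.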

The next possibility is essentially the suggestion of Cobb (1978). In analogy with Fisher's
(1934) observation that the conditional probability density of the maximum likelihood estimator
of a location parameter given the sample spacings, which are ancillary in that case, is the
normalized likelihood function, one may show by a direct calculation that

\[ P_j(\hat{j} - j = n \mid Y_i, i \in \mathbb{Z}) = P_0(\hat{j} = n \mid Y_i, i \in \mathbb{Z}) = \exp(S_{\hat{j}_{obs} - n}) \Big/ \sum_i \exp(S_i), \tag{1} \]

where $\hat{j}_{obs}$ denotes the observed value of $\hat{j}$. Let

\[ p_n = \exp(S_n) \Big/ \sum_i \exp(S_i), \quad n \in \mathbb{Z}. \tag{2} \]

(iv) It follows from (1) that a confidence set of conditional coverage probability $1 - \alpha$ can
be formed as follows. Order the $p_n$ in (2) as $p_{(1)} \ge p_{(2)} \ge \cdots$. Construct the set $C_4$ by putting
the index $n_1$ corresponding to $p_{(1)}$ in $C_4$ and continuing to add points $n_2, \ldots, n_k$ corresponding
to $p_{(2)}, \ldots, p_{(k)}$ as long as $\sum_{i<k} p_{(i)} < 1 - \alpha$. Note that for a Bayesian with a uniform prior on
$\mathbb{Z}$,

\[ p_n = \mathrm{pr}(j = n \mid x_i, i \in \mathbb{Z}) \]

and hence the set $C_4$ is a highest posterior probability credible set for j. In fact, even without
the explicit evaluation in (1), one knows from a general theorem of Stein (1965) and Hora and
Buehler (1966) that the highest posterior credible set for j is also a confidence set.

(v) One can also obtain an unconditional confidence set from the formal posterior
probabilities $(p_n, n \in \mathbb{Z})$ in (2) as follows: let c be such that

\[ P_j\{p_j \ge c\} = P_0\Big\{ \sum_n \exp(S_n) \le c^{-1} \Big\} = 1 - \alpha, \tag{3} \]

and $C_5 = \{n : p_n \ge c\}$. Then $C_5$ is a $(1 - \alpha)100\%$ confidence set, which according to a general
theorem of Hooper (1982) or alternatively by a simple Neyman-Pearson argument has smallest
expected size among all shift equivariant confidence sets.
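
The constructions in (iv) and (v) can be sketched in Python as follows, again for a finite list of log likelihood values, which is an assumption of this example. The function names are hypothetical, and the cutoff c for method (v) is taken as given, since calibrating it requires the sampling distribution in (3).

```python
import math


def formal_posterior(loglik):
    """Normalized likelihood p_n = exp(S_n) / sum_i exp(S_i), computed stably."""
    top = max(loglik)
    weights = [math.exp(s - top) for s in loglik]
    total = sum(weights)
    return [w / total for w in weights]


def highest_posterior_set(p, alpha):
    """Method (iv): add indices in decreasing order of p_n while the mass
    accumulated so far is still below 1 - alpha."""
    order = sorted(range(len(p)), key=lambda n: -p[n])
    chosen, mass = [], 0.0
    for n in order:
        if mass >= 1.0 - alpha:
            break
        chosen.append(n)
        mass += p[n]
    return sorted(chosen)


def threshold_set(p, c):
    """Method (v): keep every index whose formal posterior probability is at least c."""
    return [n for n, pn in enumerate(p) if pn >= c]


# Toy log likelihood (made-up numbers) indexed by candidate change-points 0..8.
loglik = [-9.0, -4.0, -1.0, -5.0, -6.0, 0.0, -2.5, -7.0, -12.0]
p = formal_posterior(loglik)
print(highest_posterior_set(p, alpha=0.05))
print(threshold_set(p, c=0.05))
```

Both functions rank indices by likelihood; they differ only in the stopping rule, which is the point made in the Remarks that follow.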


Remarks.  The  confidence  sets  (ii),  (iv),  and  (v)  all  order  the  parameter  values  for  inclusion 
according  to  the  value  of  the  likelihood  function.  Where  they  disagree  is  where  to  draw  the 
line  between  inclusion  and  exclusion.  For  those  who  strongly  prefer  a  confidence  interval  to  a 
possibly disconnected confidence set, (iii) appears to be a reasonable modification of (ii). It is
possible  to  give  analogous  modifications  of  (iv)  and  (v). 


Of these five confidence sets, all except for (iv) require computation of a sampling
distribution. Approximations are suggested in the following section.

3.  Comparisons. 


The purpose of this section is to compare the expected size of the various confidence sets
proposed in Section 2. Since the case of known G and F is artificially simple and our main goal
is insight into the case where G and F contain unknown nuisance parameters, there seems to
be little harm in simplifying the technical problems somewhat by assuming that F is $N(0, 1)$
and G is $N(\delta, 1)$ for a known $\delta > 0$.

Siegmund  (1986)  considers  the  computationally  simpler  case  of  a  Brownian  motion  process 
and  shows  that  the  length  of  the  confidence  interval  defined  in  (i)  is  substantially  longer  than 
the  expected  size  of  the  confidence  sets  in  (ii)  and  (iii). 

In the present context it can be shown as $\alpha \to 0$ that the expected sizes of the confidence
sets in (ii) - (v) are all $\sim 4\delta^{-2} \log \alpha^{-1}$, whereas the length of the interval in (i) is
$\sim 8\delta^{-2} \log \alpha^{-1}$. Hence the confidence interval $C_1$ defined in (i) appears not to be competitive
with the others and will not be considered further.

Although  Siegmund’s  (1986)  comparison  of  (ii)  and  (iii)  favors  (ii),  the  difference  is  not 
large.  In  fact  there  is  a  transcription  error  in  passing  from  the  first  to  the  second  line  of  the 
display  following  (3.15)  of  Siegmund  (1986),  and  consequently  the  difference  in  the  numerical 
example  between  methods  (ii)  and  (iii)  is  smaller  than  stated  there.  Since  one  suspects  that 
the  rapid  fluctuations  of  Brownian  motion  may  account  for  some  of  that  difference,  and  since 
(iii)  is  the  only  remaining  interval  estimate  and  is  a  surrogate  for  interval  modifications  of  (iv) 
and  (v),  it  seems  reasonable  to  make  a  comparison  of  (ii)  and  (iii)  in  the  present  discrete  time 
setting. Theorem 1 below gives asymptotic expansions as $\alpha \to 0$ of the expected size of the
confidence  sets  (ii)  and  (iii). 

It seems difficult to give comparably precise expansions for (iv) and (v). Hence (ii),
(iv), and (v) are compared below in a Monte Carlo experiment, which also shows that the
approximations given in Theorem 1 are reasonably accurate.

We begin with approximations for the coverage probability of (ii) and (iii). Let $\Phi$ be the
standard normal distribution function and

\[ \nu(x) = 2x^{-2} \exp\Big\{ -2 \sum_{n=1}^{\infty} n^{-1} \Phi(-x\sqrt{n}/2) \Big\} \quad (x > 0). \tag{4} \]

For computational purposes it usually suffices to use the small x approximation (Siegmund,
1985, p. 219)

\[ \nu(x) = \exp(-\rho x) + o(x^2) \quad (x \to 0), \tag{5} \]

where $\rho \approx .583$. For the normally distributed $x_n$, $n \in \mathbb{Z}$, under consideration here
$S_n = \delta(n\delta/2 - \tilde{S}_n)$, $n = 0, 1, \ldots$, where $\tilde{S}_n = x_1 + \cdots + x_n$. It follows from a classical result of
Cramér (cf. Siegmund, 1985, (8.49)) that

\[ P_0\Big\{ \max_{n \ge 0} S_n \ge \eta \Big\} \sim \nu(\delta) \exp(-\eta) \quad (\eta \to \infty) \tag{6} \]

and hence by (5) for $A_j$ defined in (ii) above

\[ P_j(A_j) \cong \{1 - \exp(-\eta - \rho\delta)\}^2. \tag{7} \]
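
The following Python sketch evaluates the series (4) by truncation, compares it with the approximation (5), and inverts the coverage approximation (7) to obtain the threshold for a given nominal level. The function names and the truncation length are assumptions of this illustration.

```python
import math

RHO = 0.583  # constant appearing in approximation (5)


def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def nu_series(x, terms=5000):
    """nu(x) from (4), with the infinite sum truncated after `terms` terms.
    The truncation is adequate for moderate x; very small x needs more terms."""
    total = sum(std_normal_cdf(-x * math.sqrt(n) / 2.0) / n for n in range(1, terms + 1))
    return 2.0 / x ** 2 * math.exp(-2.0 * total)


def nu_approx(x):
    """Small-x approximation (5)."""
    return math.exp(-RHO * x)


def eta_from_alpha(alpha, delta):
    """Solve {1 - exp(-eta - rho * delta)}^2 = 1 - alpha for eta, as in (7)."""
    return -math.log(1.0 - math.sqrt(1.0 - alpha)) - RHO * delta


print(nu_series(1.0), nu_approx(1.0))
print(eta_from_alpha(0.05, 0.7))
```

For a nominal 95% set with a shift of 0.7, the inversion gives a threshold close to the 3.27 units quoted later in this section for Figure 1.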

By conditioning on $\max_{n \ge 0} S_n$, one may show for R defined in (iii),

\[ P_0(R < 0) = P_0\Big\{ \max_{n < 0} S_n > \max_{n \ge 0} S_n + \eta' \Big\} \sim \nu(\delta) \exp(-\eta')\, E_0\Big\{ \exp\Big( -\max_{n \ge 0} S_n \Big) \Big\} \tag{8} \]

as $\eta' \to \infty$. It is possible to compute the expectation on the right hand side of (8) numerically or
give a small $\delta$ expansion analogous to (5), but for our purposes it seems adequate to pretend
that (6) is an equality, which leads to the approximation

\[ P_0(0 \notin [L, R]) \cong 2\exp(-\eta' - \rho\delta)\{1 - \exp(-\rho\delta)/2\}. \tag{9} \]


The following theorem gives an asymptotic expansion as $\alpha \to 0$ of the expected size of
$C_2$ defined in (ii) and $[L, R]$ defined in (iii). It will be convenient to use the notation $[y]$ =
integer part of y, $|C|$ = number of elements in the set C, and $M = \sup_{n \ge 0} S_n$.

Theorem 1. Let $C_2$ be the confidence set defined in (ii) and $[L, R]$ the confidence interval
defined in (iii). As $\eta \to \infty$

\[ E_j|C_2| = 2[2\eta/\delta^2] + 4/\delta^2 - 4\delta^{-2} \int_0^\infty \{2P_0(M > x) - P_0^2(M > x)\}\,dx + o(1), \]

and as $\eta' \to \infty$

\[ E_j(R - L) = 2[2\eta'/\delta^2] + 4/\delta^2 - 4\delta^{-2} \int_{[0,\infty)} \int_0^\infty P_0(M \in dy)\{2P_0(M > x + y) - P_0^2(M > x + y)\}\,dx + o(1). \]

A proof is sketched in an appendix.


To obtain easily evaluated approximations to the integrals appearing in these expressions,
one may again pretend that (6) is an equality and use (5). This leads to

\[ E_j|C_2| \cong 2[2\eta/\delta^2] + 2\delta^{-2}(2 - 4e^{-\rho\delta} + e^{-2\rho\delta}) \tag{10} \]

and

\[ E_j(R - L) \cong 2[2\eta'/\delta^2] + 2\delta^{-2}(2 - 4e^{-\rho\delta} + 3e^{-2\rho\delta} - 2e^{-3\rho\delta}/3). \tag{11} \]
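
The approximations (10) and (11) are simple enough to evaluate directly. The Python sketch below does so, with the thresholds obtained by inverting (7) and (9); the function names are hypothetical, and the inversion of (9) is this example's own rearrangement rather than a formula stated in the text.

```python
import math

RHO = 0.583  # constant appearing in approximation (5)


def eta_for_set(alpha, delta):
    """Threshold for the likelihood ratio set (ii), from (7)."""
    return -math.log(1.0 - math.sqrt(1.0 - alpha)) - RHO * delta


def eta_for_interval(alpha, delta):
    """Threshold for the interval (iii): solve 2 exp(-eta' - rho delta)(1 - nu/2) = alpha, from (9)."""
    nu = math.exp(-RHO * delta)
    return -math.log(alpha / (2.0 * (1.0 - nu / 2.0))) - RHO * delta


def expected_size_lr_set(eta, delta):
    """Approximation (10) to the expected size of the set (ii)."""
    nu = math.exp(-RHO * delta)
    return 2 * math.floor(2.0 * eta / delta ** 2) + 2.0 / delta ** 2 * (2 - 4 * nu + nu ** 2)


def expected_length_interval(eta_prime, delta):
    """Approximation (11) to the expected value of R - L for the interval (iii)."""
    nu = math.exp(-RHO * delta)
    return (2 * math.floor(2.0 * eta_prime / delta ** 2)
            + 2.0 / delta ** 2 * (2 - 4 * nu + 3 * nu ** 2 - 2 * nu ** 3 / 3))


alpha, delta = 0.05, 0.7
print(expected_size_lr_set(eta_for_set(alpha, delta), delta))
print(expected_length_interval(eta_for_interval(alpha, delta), delta))
```

Because of the integer-part terms, both approximations move in steps as the threshold varies, so small differences between them at a particular level should not be over-interpreted.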

Table 1 contains some numerical examples. It indicates that there is essentially no
difference between the expected size of the confidence sets (ii) and (iii). On the basis of these
results a statistician who strongly prefers a confidence interval to the generally disconnected
likelihood ratio confidence set should feel comfortable in imposing that constraint.


Table 1.

Expected Size of Confidence Sets (ii) and (iii)

    α    δ    E_0|C_2| (10)    E_0(R - L) (11)

In the present context of completely specified distributions there is no sampling theory
to develop in order to use the confidence set (iv). However, it seems a difficult problem to give
a reasonable approximation for the related set defined in (v). A crude approximation to (3)
which might be used as the first step in an iterative numerical or Monte Carlo scheme is to
replace $S_n$ by a Brownian motion process $W(t)$ with drift $-(\delta^2/2)\,\mathrm{sgn}(t)$ and variance $\delta^2$ and
replace the sum in (3) by an integral. One easily sees that the integral over $[0, \infty)$ has the
distribution given by Pollak and Siegmund (1985, Proposition 3). This can be convolved with
itself to obtain $\mathrm{pr}\{\int_{-\infty}^{\infty} \exp\{W(t)\}\,dt \le c^{-1}\} = 2\delta^{-1}\sqrt{c}\,\exp(-4c/\delta^2) K_1(2\delta^{-1}\sqrt{c})$, where $K_1$ is
the modified Bessel function of the second kind.

Table 2 reports the results of a 1000 repetition Monte Carlo experiment with m = 100
and j = 50 to compare the confidence sets $C_2$, $C_4$, and $C_5$. It confirms that the analytic
approximation for the expected size of $C_2$ given in Theorem 1 is reasonably accurate and
shows that all three confidence sets have about the same expected size.
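
An experiment of this kind can be sketched in a few lines of Python for the likelihood ratio set (ii). The sketch assumes the normal shift model of this section, a finite sample in which every index 0, ..., m is a candidate change-point, and the threshold 3.27 obtained from (7) for a nominal 95% set with a shift of 0.7. The function names, the seed, and the default repetition count are arbitrary choices of this example, and it does not attempt to reproduce the figures in Table 2.

```python
import random


def simulate_once(m, j, delta, eta, rng):
    """Draw one sample and return (covers_j, size) for the likelihood ratio set."""
    # x_1..x_j ~ N(0, 1) and x_{j+1}..x_m ~ N(delta, 1); x[i] holds x_{i+1}.
    x = [rng.gauss(0.0 if i < j else delta, 1.0) for i in range(m)]
    # loglik[n] = sum over i > n of (delta * x_i - delta^2 / 2): the log likelihood
    # of a change after observation n, up to an additive constant.
    loglik = [0.0] * (m + 1)
    for n in range(m - 1, -1, -1):
        loglik[n] = loglik[n + 1] + delta * x[n] - delta ** 2 / 2.0
    top = max(loglik)
    members = [n for n in range(m + 1) if loglik[n] >= top - eta]
    return j in members, len(members)


def monte_carlo(m=100, j=50, delta=0.7, eta=3.27, reps=1000, seed=0):
    """Estimate the non-coverage probability and mean size of the set (ii)."""
    rng = random.Random(seed)
    hits = 0
    total_size = 0
    for _ in range(reps):
        covered, size = simulate_once(m, j, delta, eta, rng)
        hits += covered
        total_size += size
    return 1.0 - hits / reps, total_size / reps


miss_rate, mean_size = monte_carlo(reps=400)
print(f"estimated non-coverage {miss_rate:.3f}, mean size {mean_size:.1f}")
```

The same loop extends to the sets (iv) and (v) by replacing the membership rule, at the cost of first calibrating the cutoff c for (v).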


Table 2.

Monte Carlo Comparison of C_2, C_4, and C_5

                         C_2                 C_4                      C_5
α (nominal)    δ      α̂     E_0|C_2|      α̂     E_0|C_4|      c       α̂     E_0|C_5|
    .10       0.7    .090     18.8       .084     19.5       .010    .092     19.3
    .10       1.0    .098      9.6       .085     10.3       .022    .113      9.4
    .05       0.7    .041     24.6       .040     25.2       .005    .047     26.0
    .05       1.0    .048     12.6       .037     13.2       .011    .052     12.6


Although the confidence sets defined in (ii)-(iv) perform similarly on the average, they can
treat individual sets of data differently. Figure 1 displays two simulated log likelihoods with
m = 100, j = 50, and $\delta = 0.7$. The horizontal line defines the 95% likelihood ratio confidence
set (ii). In accordance with the approximation (7) it is drawn 3.27 units below the maximum
of the log likelihood function.

In the upper part of Figure 1 the one major peak of the log likelihood is fairly sharp
with the consequence that all the confidence sets are about one half their expected size of 25.
The confidence interval defined in (iii) has one point less on each end than the likelihood ratio
confidence set. The formal Bayes posterior set, $C_4$, makes a smaller adaptation to the peaked
log likelihood; it contains four more points, including the local maximum at 63. The confidence
set $C_5$ is the same as the likelihood ratio confidence set.

The  lower  part  of  Figure  1  contains  a  comparatively  flat  log  likelihood  with  two  distinct 
peaks.  The  likelihood  ratio  confidence  set  contains  33  points.  The  interval  modification  is  now 
slightly  larger  because  it  contains  points  of  relatively  low  likelihood:  44,  45,  56-58.  Again  the 
formal  Bayes  posterior  set  adapts  less  to  the  departure  of  the  log  likelihood  from  its  expected 
shape  and  this  time  contains  four  fewer  points  than  the  likelihood  ratio  confidence  set. 

In general, the interval modification (iii) is usually slightly shorter than the likelihood ratio
confidence set but can be considerably larger. The formal Bayes posterior set is usually larger
than the likelihood ratio set when both sets are small and smaller when both sets are large. This
suggests that there may be recognizable subsets making the conditional coverage probability
of the likelihood ratio set differ from its nominal value. The confidence set $C_5$ can look rather
foolish conditionally. If all the $p_n$ are very small and about equal, it can deliver a small, or
perhaps empty, confidence set while the other methods recognize the data as uninformative
and yield large confidence sets. Presumably this occurs with small probability.

Overall the evidence given here does not seem persuasive for choosing among the
confidence sets (ii) - (v). A possible conclusion is that in more complex problems one may reasonably
use whichever method seems most easily adapted to the problem at hand. When the
distributions F and G are unknown, but can be imbedded in a common exponential family, one can use
a conditioning argument to obtain exact likelihood ratio confidence sets. This is the subject
of the next section.

4.  The  Likelihood  Ratio  Method  for  an  Exponential  Family. 

Now suppose that F and G can be imbedded in an exponential family of the form

\[ dF_\theta(x) = \exp\{\theta x - \psi(\theta)\}\,dF_0(x) \]

relative to some fixed distribution $F_0$, which without loss of generality can be standardized
to have mean 0 and variance 1. Thus for some unknown $\theta_0 \ne \theta_1$ and $j \in \{1, \ldots, m\}$, $x_1, \ldots, x_j$
have distribution $F_{\theta_0}$ and $x_{j+1}, \ldots, x_m$ have distribution $F_{\theta_1}$. The probability on the space of
$x_1, \ldots, x_m$ will be denoted by pr, with the dependence on j, $\theta_0$, and $\theta_1$ suppressed.

Several writers, e.g., Davies (1977), Siegmund (1986), and Worsley (1986), have observed
that one can extend the likelihood ratio method (ii) of Section 2 to obtain a confidence set
for j in the presence of the unknown nuisance parameters $\theta_0, \theta_1$ as follows. Let
$H(x) = \sup_\theta\{\theta x - \psi(\theta)\}$, $S_n = x_1 + \cdots + x_n$, and

\[ \Lambda_{n,m} = nH(n^{-1}S_n) + (m - n)H\{(m - n)^{-1}(S_m - S_n)\}. \tag{12} \]


The likelihood ratio test of the hypothesis that the change-point is j has acceptance region of
the form

\[ A_j = \Big\{ \max_n \Lambda_{n,m} - \Lambda_{j,m} \le k \Big\}. \]

By sufficiency the conditional probability of $A_j$ given $(S_j, S_m)$ does not depend on $\theta_0, \theta_1$. Hence
if one chooses $k = k(\xi_1, \xi_2)$ so that

\[ \mathrm{pr}(A_j \mid S_j = \xi_1, S_m = \xi_2) = 1 - \alpha \]

for all $\xi_1, \xi_2$, then the set of values j which are accepted by the test is a $(1 - \alpha)100\%$ confidence
set.

It is not actually necessary to solve for $k(\xi_1, \xi_2)$ in order to determine the confidence set.
Given $S_j$ and $S_m$, $\Lambda_{j,m}$ is constant, and hence the confidence set is most easily determined as
the set of j for which

\[ \mathrm{pr}\Big\{ \max_n \Lambda_{n,m} < \Big( \max_n \Lambda_{n,m} \Big)_{obs} \,\Big|\, S_j, S_m \Big\} \le 1 - \alpha. \tag{13} \]


An  approximation  for  this  conditional  probability  which  seems  adequate  for  many  cases  is 
given  below. 

Note that one might also define $A_j$ as the acceptance region of the likelihood ratio test in
the conditional model given $S_m$. The unconditional test is often simpler analytically, but the
conditional one may turn out to be preferable. In the simplest case of a normal distribution
with mean $\theta$ and variance 1 there is no difference between the tests. Similarly one could
substitute Pettitt's (1980) test with acceptance region of the form

\[ A_j = \Big\{ \max_n |nS_m/m - S_n| - |jS_m/m - S_j| \le a \Big\}. \tag{14} \]


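Given $(S_j, S_m)$ the random variables $\max_{n<j} \Lambda_{n,m}$ and $\max_{j<n<m} \Lambda_{n,m}$ are conditionally
independent, and hence the left hand side of (13) is of the form

\[ \mathrm{pr}\Big( \max_{n<j} \Lambda_{n,m} < a \,\Big|\, S_j, S_m \Big)\, \mathrm{pr}\Big( \max_{j<n<m} \Lambda_{n,m} < a \,\Big|\, S_j, S_m \Big). \tag{15} \]

These two probabilities present similar computational problems, so it suffices to consider the
second one, or equivalently

\[ \mathrm{pr}\Big( \max_{j<n<m} \Lambda_{n,m} \ge a \,\Big|\, S_j, S_m \Big). \tag{16} \]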

In the special case that $F_\theta$ is the normal distribution with mean $\theta$ and variance 1, the
probability (16) equals

\[ \mathrm{pr}\Big\{ \max_{j<n<m} \frac{(nS_m/m - S_n)^2}{2n(1 - n/m)} \ge a \,\Big|\, jS_m/m - S_j = \xi \Big\}, \tag{17} \]

for which Siegmund (1986) gives an approximation under the assumptions that j, a, and $\xi$ are
all proportional to m, and

\[ c^2 = 2a - \xi^2/\{j(1 - j/m)\} \tag{18} \]

is asymptotically a positive multiple of m as $m \to \infty$. A somewhat simpler approximation is
obtained by assuming that $c^2 = o(m)$. For ease of reference we record the result as Theorem 2.

Theorem 2. Let $x_1, \ldots, x_m$ be independent standard normal random variables and
$S_n = x_1 + \cdots + x_n$. Let $\min(j, m - j)$, a, and $\xi$ be proportional to m as $m \to \infty$. Suppose $c^2$ defined
by (18) diverges to $+\infty$ but $c^2 = o(m)$. Then the probability (17) is

\[ \sim \nu[\xi/\{j(1 - j/m)\}] \exp(-c^2/2) \tag{19} \]

as $m \to \infty$, where $\nu$ is defined in (4) and given approximately by (5).

From the simulations reported in Table 6 of Siegmund (1986) one can see that (19)
is reasonably accurate for the range of j, m, and $\xi$ considered there. Presumably it is less
accurate for larger c and/or smaller m, but it seems adequate for many cases of interest.

According to the approximations (19) and (5) the confidence set defined by (13) is the
set of all i such that

\[ \Big\{ 1 - \exp\Big( -.583\,[2\Lambda_{i,m}/\{i(1 - i/m)\}]^{1/2} - \Big( \max_n \Lambda_{n,m} - \Lambda_{i,m} \Big) \Big) \Big\}^2 \le 1 - \alpha. \tag{20} \]

Even when one questions the accuracy of (19) or when the data are not normal, the central limit
theorem suggests the use of (20) as a first approximation. A better approximation, simulation,
or numerical methods can be used to decide whether values of i on the borderline according to
(20) should be included in or excluded from the confidence set.
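
The screening rule (20) is easy to apply once the likelihood ratio statistics are available. The Python sketch below reads the left hand side of (20) as the square of a single factor, which is consistent with the remark later in this section that the normal approximation involves two equal factors and with the value .961 reported there for j = 129. The function names and the dictionary representation of the statistics are assumptions of this example.

```python
import math

RHO = 0.583  # constant appearing in approximation (5)


def approx_acceptance_probability(lam_i, lam_max, i, m):
    """Left hand side of (20) for candidate i, given its statistic and the observed maximum."""
    scale = math.sqrt(2.0 * lam_i / (i * (1.0 - i / m)))
    return (1.0 - math.exp(-RHO * scale - (lam_max - lam_i))) ** 2


def approx_confidence_set(lam, m, alpha):
    """Candidates i (keys of the dict `lam`) that satisfy the screening rule (20)."""
    lam_max = max(lam.values())
    return sorted(i for i, value in lam.items()
                  if approx_acceptance_probability(value, lam_max, i, m) <= 1.0 - alpha)


# Values quoted in the coal mining example of this section: m = 190, maximum 35.6,
# and a statistic of 32.4 at j = 129.
print(approx_acceptance_probability(32.4, 35.6, 129, 190))
```

Candidates whose value lies close to the nominal level are the borderline cases for which a more accurate approximation or simulation is worth the extra effort.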

Note also the formal similarity between (19) and (6). To the extent that $\{i(1 - i/m)\}^{1/2}$ is
nearly constant over the values i of interest, e.g., when the likelihood ratio statistic is sharply
peaked and hence the confidence set is small, (20) shows that the confidence set consists of
those i for which $\Lambda_{i,m}$ is within some distance of $\max_n \Lambda_{n,m}$, which can be displayed graphically
as in Section 2.

Figure 2 shows the log likelihood ratio statistic and the approximate cutoff for a 95%
confidence set for the same simulated data as in Figure 1. Qualitatively the cases of known
and unknown $\delta$ look quite similar. Usually the confidence set is larger in the case of unknown
$\delta$, and this is indeed so in the lower plot. However, the reverse is true in the upper plot,
presumably because the procedure in effect estimates $\delta$ and then acts as if the estimated value,
which in this case is large, is the true one.

Returning to the general exponential family, if we let $a = ma_0$ and condition on
$S_m = \xi_2 = m\xi_0$, we see from (12) that

\[ \Big\{ \max_{j<n<m} \Lambda_{n,m} \ge a \Big\} = \bigcup_{n=j+1}^{m-1} [\{S_n \ge mb_2(n/m)\} \cup \{S_n \le mb_1(n/m)\}], \]

where $b_1(t) < b_2(t)$, $0 < t < 1$, are the solutions of

\[ tH\{t^{-1}b_i(t)\} + (1 - t)H[(1 - t)^{-1}\{\xi_0 - b_i(t)\}] = a_0. \tag{21} \]

Usually one is interested in evaluating (16) in cases where $S_j = \xi_1$ is fairly close to one of the
boundary curves $mb_1(j/m)$ or $mb_2(j/m)$. Thus the probability of crossing the other can be
neglected, and it seems reasonable to develop an approximation in which the distance from $\xi_1$
to the relevant curve is small in some sense. See Figure 3. Our problem reduces to approximate
evaluation of probabilities like

\[ \mathrm{pr}[S_{j+i} \ge mb_2\{(j + i)/m\} \text{ for some } i < m - j \mid S_j = \xi_1, S_m = \xi_2]. \tag{22} \]

The mathematically convenient interpretation of the condition that $mb_2(j/m) - \xi_1$ be
small is that it be $O(\sqrt{m})$.

Siegmund (1985, 1986) develops a method for approximating boundary crossing
probabilities which can be adapted to the present context. A suitable result is given in Appendix
B.

As  an  illustration  we  consider  the  British  coal  mining  accident  data  of  Maguire,  Pearson, 
and  Wynn  (1952),  as  extended  and  corrected  by  Jarrett  (1979).  Worsley  (1986)  has  analysed 
the  original  data  and  determined  the  likelihood  ratio  confidence  set  by  numerical  computation 
of  (15). 

The data are intervals in days between accidents in British coal mines in which at least
ten deaths occurred. Jarrett's (1979) data involve m = 190 intervals from 15 March, 1851 to
22 March, 1962, a period of 40,549 days. Under the assumption that the intervals $y_1, \ldots, y_m$
are independent and exponentially distributed with a change after the j-th observation in the
mean time between accidents, we shall determine a likelihood ratio confidence set for j.

The likelihood ratio statistic is $\max_n \Lambda_{n,m} = \max_n [m \log(W_m/m) - n \log(W_n/n) -
(m - n) \log\{(W_m - W_n)/(m - n)\}]$, where $W_n = y_1 + \cdots + y_n$. For Jarrett's data the maximum value
equals 35.6 and is assumed at n = 124 in the year 1890. The approximation (20) gives the set
{116, 117, ..., 128, 133} as a 95% confidence set for the change-point. This corresponds to the
interval from 1887 to 1893 together with an isolated point in 1897. One may want to use the
presumably more accurate probability approximation given in Theorem 3 in Appendix B to
check some of the borderline cases.
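
For exponentially distributed intervals the statistic above depends on the data only through cumulative sums, so it can be computed in a single pass. The Python sketch below does this for a made-up series; the coal mining data themselves are not reproduced here, and the function name is hypothetical.

```python
import math
from itertools import accumulate


def exponential_lr_statistics(y):
    """Return {n: statistic} for n = 1..m-1, where the statistic is
    m log(W_m/m) - n log(W_n/n) - (m-n) log((W_m - W_n)/(m-n))."""
    m = len(y)
    w = list(accumulate(y))  # w[n-1] holds W_n = y_1 + ... + y_n
    total = w[-1]
    stats = {}
    for n in range(1, m):
        wn = w[n - 1]
        stats[n] = (m * math.log(total / m)
                    - n * math.log(wn / n)
                    - (m - n) * math.log((total - wn) / (m - n)))
    return stats


# Made-up intervals whose mean jumps from 1 to 10 after the fifth observation.
y = [1.0] * 5 + [10.0] * 5
stats = exponential_lr_statistics(y)
best = max(stats, key=stats.get)
print(best, stats[best])
```

The resulting dictionary can be passed directly to a screening rule such as (20) to list the candidate change-points worth closer examination.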
Figure 3. Conditional Boundary Crossing Problem.

For example, for j = 129, $\Lambda_{j,m} = 32.4$, so (20) yields .961, and 129 is not included in the
confidence set. From (21) with $a = ma_0 = 35.6$ one easily calculates the ingredients to apply
Theorem 3 in Appendix B and obtains as an approximation to (15) (1 - .011)(1 - .024) =
.965, which confirms that 129 should be excluded from the confidence set. Note that this
approximation and the normal approximation, (20), are reasonably consistent, although the
normal approximation involves two equal factors while this one contains two unequal factors,
one smaller and one larger than in the normal approximation. For j = 128 the approximation
by means of Theorem 3 for the second factor in (15) is 1 - .057 = .943, and hence 128 is
included in the confidence set regardless of the value of the first factor. After examining two or
three values of j, one quickly concludes that the approximation of Theorem 3 yields the same
confidence set as the crude normal approximation.

Application of (20) to the original Maguire, Pearson, and Wynn (1952) data gives
precisely the same confidence set which Worsley computed numerically. However, because of
discrepancies between the two data sets, the years covered by the two confidence sets are
slightly different.

Raftery and Akman (1986) give a flat prior Bayesian analysis of these data. It appears
from their calculations and figure that a highest posterior set estimate for the change-point is
the same as the confidence set computed here. Presumably such a posterior set is, under some
general conditions, approximately a confidence set for large m, but the elegant exact relation of
(1) and (2) is no longer valid. It would be interesting to give some precise asymptotic results,
which would serve to extend the method (iv) in Section 2 to the case of unknown nuisance
parameters.

Cobb  (1978)  has  suggested  an  alternative  extension  of  method  (iv)  to  deal  with  nuisance 
parameters,  but  it  contains  some  arbitrary  features  which  may  make  it  difficult  to  implement 
with  small  or  moderate  sample  sizes. 

5.  Joint  Confidence  Sets. 

The likelihood ratio method can also be adapted to give joint confidence sets for the
change-point j and some function $\delta$ of the parameters $\theta_0$ and $\theta_1$. In this section we consider
the simple case of normally distributed $x_i$ having mean $\theta_0$ or $\theta_1$ according as $1 \le i \le j$ or
$j < i \le m$ and variance one, and take $\delta = \theta_1 - \theta_0$.

The  acceptance  region  of  the  likelihood  ratio  test  of  the  hypothesis  that  the  parameters 


are  j  and  8  is 


AjiS  =  [sup  A ,,m  -  8{jSm/m  -  Sj  -  j(  1  -  j/m)/2 }  <  c2/ 2], 


where  A<  m  =  ( iSm/m  —  S,)2/{2i(l  —  t’/m)}  and  c  =  c(j,8)  is  chosen  to  satisfy 


PriAi,s)  =  l~a 


for  all  j  and  8.  Note  that 


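$$\sup_i \Lambda_{i,m} - \delta\{jS_m/m - S_j - 2^{-1}\delta j(1 - j/m)\} = \Big\{\sup_i \Lambda_{i,m} - \Lambda_{j,m}\Big\} + \frac{\{jS_m/m - S_j - \delta j(1 - j/m)\}^2}{2j(1 - j/m)}, \qquad (23)$$

and since the first difference on the right hand side is necessarily non-negative, one obtains

$$\mathrm{pr}(A^c_{j,\delta}) = \mathrm{pr}\big[\,|jS_m/m - S_j - \delta j(1 - j/m)| \ge c\{j(1 - j/m)\}^{1/2}\big]$$
$$\qquad + E\big[\mathrm{pr}(A^c_{j,\delta} \mid jS_m/m - S_j);\ |jS_m/m - S_j - \delta j(1 - j/m)| < c\{j(1 - j/m)\}^{1/2}\big]. \qquad (24)$$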
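The identity displayed just before (24), and the normal law behind the first term of (24), can be checked by completing the square. The following is a short verification, not part of the original text; the abbreviations $U_j = jS_m/m - S_j$ and $v_j = j(1 - j/m)$ are introduced here only for brevity.

```latex
\begin{align*}
\Lambda_{j,m} &= U_j^2/(2v_j), \\
(U_j - \delta v_j)^2/(2v_j) - \Lambda_{j,m}
  &= -\delta U_j + \delta^2 v_j/2
   = -\delta\{U_j - \delta v_j/2\}, \\
E(U_j) &= \frac{j(m-j)}{m}(\theta_1 - \theta_0) = \delta v_j, \\
\operatorname{var}(U_j) &= (1 - j/m)^2\, j + (j/m)^2 (m - j) = v_j .
\end{align*}
```

Hence, when the true parameters are $j$ and $\delta$, the quantity $(U_j - \delta v_j)/v_j^{1/2}$ is standard normal.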

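The first term on the right hand side of (24) is exactly $2\{1 - \Phi(c)\}$. According to Theorem 2

$$\mathrm{pr}\{A^c_{j,\delta} \mid jS_m/m - S_j = \xi\} \sim 2\nu[\xi/\{j(1 - j/m)\}] \exp\big[-c^2/2 + \{\xi - \delta j(1 - j/m)\}^2/\{2j(1 - j/m)\}\big],$$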

provided the exponent diverges more slowly than $m$ as $m \to \infty$. Substitution of this approximation into (24) yields

$$\mathrm{pr}(A^c_{j,\delta}) \cong 2\{1 - \Phi(c)\} + 2\varphi(c)\int_{-c}^{c} \nu[\delta + z/\{j(1 - j/m)\}^{1/2}]\,dz$$
$$\qquad \cong 2\{1 - \Phi(c)\} + 4\nu(\delta)c\varphi(c), \qquad (25)$$

if we assume that $c/\{j(1 - j/m)\}^{1/2}$ is small. Here $\varphi$ and $\Phi$ are the standard normal density
and distribution function respectively.

Using (25) one can easily find an approximate confidence set by trial and error. Given $i$,
one sets $\delta$ equal to the estimator $\hat\delta_i = (iS_m/m - S_i)/\{i(1 - i/m)\}$ and finds the value of $c$ for
which (25) equals $\alpha$. Thus by (23) one finds whether that $i$ and some $\delta$ are in the confidence
set. Then one iteratively finds upper and lower bounds on $\delta$ for that particular value of $i$. In
principle this must be repeated for each $i$.
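The trial-and-error search can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's program: it uses the simplified right hand side of (25), and it assumes the local approximation $\nu(x) \approx \exp(-0.583|x|)$, which is not restated in this section (the constant 0.583 and the symmetric treatment of negative $\delta$ are assumptions of the sketch). All function names are hypothetical.

```python
import math

RHO = 0.583  # ASSUMPTION: constant in the local approximation nu(x) ~ exp(-RHO*|x|)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def nu(x):
    """ASSUMPTION: stand-in for the paper's nu, which is defined outside this section."""
    return math.exp(-RHO * abs(x))

def noncoverage(c, delta):
    """Simplified right hand side of (25): 2{1 - Phi(c)} + 4 nu(delta) c phi(c)."""
    return 2.0 * (1.0 - Phi(c)) + 4.0 * nu(delta) * c * phi(c)

def solve_c(alpha, delta, tol=1e-10):
    """Bisection for c; noncoverage is strictly decreasing in c for c >= 1."""
    lo, hi = 1.0, 10.0
    if not noncoverage(hi, delta) < alpha < noncoverage(lo, delta):
        raise ValueError("alpha outside the range handled by this sketch")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if noncoverage(mid, delta) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def centred_sums(x):
    """Return {i: (U_i, v_i)} with U_i = i*S_m/m - S_i and v_i = i*(1 - i/m)."""
    m = len(x)
    total = sum(x)
    out, s = {}, 0.0
    for i in range(1, m):
        s += x[i - 1]
        out[i] = (i * total / m - s, i * (1.0 - i / m))
    return out

def delta_hat(x, i):
    """Estimator (i*S_m/m - S_i) / {i(1 - i/m)}."""
    u, v = centred_sums(x)[i]
    return u / v

def in_joint_set(x, j, delta, alpha=0.05):
    """True if (j, delta) lies in the acceptance region with c solved from (25)."""
    uv = centred_sums(x)
    sup_lambda = max(u * u / (2.0 * v) for u, v in uv.values())
    u_j, v_j = uv[j]
    stat = sup_lambda - delta * (u_j - 0.5 * delta * v_j)
    c = solve_c(alpha, delta)
    return stat < 0.5 * c * c
```

For a given $i$ one would first call `in_joint_set(x, i, delta_hat(x, i))`, and, if that succeeds, move $\delta$ up and down from `delta_hat(x, i)` until the test fails, which gives the bounds on $\delta$ for that $i$.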

An extension of this method to non-normal exponential families requires consideration of
special cases. The generalization of (23), in almost obvious notation, is

$$\sup_i \Lambda_{i,m} - \Lambda_{j,m}(\delta) = \Big\{\sup_i \Lambda_{i,m} - \Lambda_{j,m}\Big\} + \{\Lambda_{j,m} - \Lambda_{j,m}(\delta)\}.$$

If $\delta$ is a function of the difference in the natural parameters of the exponential family, e.g. if the
parent populations are Poisson and $\delta$ is the ratio of their means, by computing probabilities
conditionally given $S_m$, one obtains a statistic whose distribution is parameterized by $\delta$. On
the other hand, if the parent distributions are exponential and $\delta$ is again the ratio of their
means, because of invariance of the two sample problem under common changes of scale, the
unconditional model and unconditional distribution of $\Lambda_{i,m} - \Lambda_{j,m}(\delta)$ are appropriate.
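The Poisson case can be made concrete by a standard fact about sums of independent Poisson variables. In the display below, $\mu_0$ and $\mu_1$ denote the means before and after the change-point (symbols introduced here for illustration), with $\delta = \mu_1/\mu_0$.

```latex
\[
S_j \mid S_m = s \;\sim\; \mathrm{Binomial}\!\left(s,\ \frac{j\mu_0}{j\mu_0 + (m-j)\mu_1}\right)
 = \mathrm{Binomial}\!\left(s,\ \frac{j}{j + (m-j)\delta}\right).
\]
```

The conditional law depends on the two means only through $\delta$, which is the sense in which conditioning on $S_m$ removes the nuisance parameter.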

The  sampling  theory  seems  rather  complicated.  Presumably  a  normal  approximation 
using  (25)  suffices  when  m  is  large  enough,  but  this  needs  to  be  investigated. 
APPENDIX A

Informal  Proof  of  Theorem  1. 

We consider only the confidence interval $[L, R]$. The proof for the likelihood ratio confidence
set is similar and somewhat simpler. Since the confidence set is equivariant, it suffices
to consider the case $j = 0$. To simplify the notation we shall write pr and $E$ instead of $P_0$ and
$E_0$, write $\eta$ and $S_n$ without their distinguishing marks, and take $\delta = 1$. Recall that $M = \sup_{n \ge 0} S_n$.

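For arbitrary $n_0 = 1, 2, \ldots$

$$\begin{aligned}
E(R - L) &= \sum_{n=-\infty}^{\infty} \mathrm{pr}(L \le n \le R) = \mathrm{pr}(L \le 0 \le R) + 2\sum_{n=1}^{\infty} \mathrm{pr}(L \le n \le R) \\
&= \mathrm{pr}(L \le 0 \le R) + 2\sum_{n=1}^{\infty} \{\mathrm{pr}(R \ge n) - \mathrm{pr}(L > n)\} \\
&= 1 + 2\sum_{n=1}^{\infty} \mathrm{pr}(R \ge n) + o(1) \quad \text{as } \eta \to \infty \\
&= 1 + 2n_0 + 2\sum_{n=n_0+1}^{\infty} \mathrm{pr}(R \ge n) - 2\sum_{n=1}^{n_0} \{1 - \mathrm{pr}(R \ge n)\} + o(1). \qquad (A1)
\end{aligned}$$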


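For positive $n$, by the definition of $R$

$$\begin{aligned}
\mathrm{pr}(R \ge n) &= \mathrm{pr}\Big(\sup_{i<n} S_i < \sup_{i \ge n} S_i + \eta\Big) \\
&= \iint_{(-\eta,\infty)\times[0,\infty)} \mathrm{pr}\Big(S_n \in d\xi,\ \max_{i \ge n}(S_i - S_n) \in dy\Big) \\
&\qquad \times\ \mathrm{pr}\Big(\max_{0<i<n} S_i < \eta + \xi + y \,\Big|\, S_n = \xi\Big)\, \mathrm{pr}\Big(\max_{i \le 0} S_i < \eta + \xi + y\Big) \\
&= \iint_{(-\eta,\infty)\times[0,\infty)} \mathrm{pr}(S_n \in d\xi)\,\mathrm{pr}(M \in dy)\,\mathrm{pr}\Big(\max_{0<i<n} S_i < \eta + \xi + y \,\Big|\, S_n = \xi\Big)\,\mathrm{pr}(M < \eta + \xi + y) \\
&= \iint_{(0,\infty)\times(0,\infty)} \mathrm{pr}(S_n \in -\eta + dx)\,\mathrm{pr}(M \in dy)\,\mathrm{pr}\Big(\max_{0<i<n} S_i < x + y \,\Big|\, S_n = x - \eta\Big)\,\mathrm{pr}(M < x + y).
\end{aligned}$$

Let $n_0 = [2\eta]$ and $k = n - n_0$, so

$$\mathrm{pr}(S_n \in -\eta + dx) = \varphi\{(x + k/2)/(n_0 + k)^{1/2}\}\,(n_0 + k)^{-1/2}\,dx.$$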


It may be shown that the contribution to the two series in (A1) from values of $x$ and $k$ outside
the range $|k| < \eta^{2/3}$, $|x + k/2| < \eta^{2/3}$ is negligible, and inside this range

$$\mathrm{pr}\Big(\max_{0<i<n_0+k} S_i \ge x + y \,\Big|\, S_{n_0+k} = -\eta + x\Big) - \mathrm{pr}(M \ge x + y)$$

converges uniformly to 0. Hence for the purpose of evaluating (A1) asymptotically, $\mathrm{pr}(R \ge n_0 + k)$
may be replaced by

$$\int_0^\infty\!\!\int_0^\infty \varphi\{(x + k/2)/(n_0 + k)^{1/2}\}\,(n_0 + k)^{-1/2}\,dx\ \mathrm{pr}(M \in dy)\,\{1 - 2\,\mathrm{pr}(M \ge x + y) + \mathrm{pr}^2(M \ge x + y)\}.$$

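For $k = 0$ this integral converges to 1/2. The terms in (A1) for $k = \pm 1, \pm 2, \ldots$ may be paired,
and after some calculation one obtains

$$\begin{aligned}
\sum_{k \ge 1} \mathrm{pr}(R \ge n_0 + k) &- \sum_{k \le 0} \{1 - \mathrm{pr}(R \ge n_0 + k)\} = -1/2 \\
&+ \sum_{k \ge 1} \Big[\Phi\{2^{-1}k/(n_0 - k)^{1/2}\} - \Phi\{2^{-1}k/(n_0 + k)^{1/2}\} \\
&\qquad - 2n_0^{-1/2}\varphi(2^{-1}k/n_0^{1/2}) \int_0^\infty\!\!\int_0^\infty \mathrm{pr}(M \in dy)\{2\,\mathrm{pr}(M \ge x + y) - \mathrm{pr}^2(M \ge x + y)\}\,dx\Big] + o(1).
\end{aligned}$$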


A  Taylor  series  expansion,  approximation  of  Riemann  sums  by  integrals,  and  substitution  of 
the  result  back  into  (Al)  complete  the  informal  proof  of  Theorem  1. 
APPENDIX B


Boundary  Crossing  Probabilities. 

This  appendix  is  concerned  with  approximations  to  boundary  crossing  probabilities  like 
(22).  The  notation  used  here  is  independent  of  the  body  of  the  paper. 

Let $x_1, x_2, \ldots$ be independent random variables with distribution function of the form

$$dF_\theta(x) = \exp\{\theta x - \psi(\theta)\}\,dF_0(x) \qquad (A2)$$

with $F_0$ standardized to have mean 0 and variance 1. Let $S_n = x_1 + \ldots + x_n$. To emphasize
dependence of probabilities and expectations on $\theta$ we write $\mathrm{pr}_\theta$ and $E_\theta$.

Let $b_0 > 0$, $b(0) = 0$, and define

$$T = \inf\{n : S_n > b_0 + m\,b(n/m)\}.$$

We seek approximations as $m \to \infty$ for

$$\mathrm{pr}_0(T < m_0 \mid S_{m_0} = m_0\xi_0),$$

where $m_0 = mt_0$ for some fixed $t_0 > 0$. We assume that $\xi_0 < b(t_0)/t_0$ and $\xi_0 < b(t)/t$ uniformly
on compact subsets of $[0, t_0)$. Let $c_1 = b'(0)$, $c_2 = b''(0)/2$. Note that $\xi_0 < c_1$ and locally near
0

$$m\,b(n/m) = c_1 n + c_2 n^2/m + O(n^3/m^2). \qquad (A3)$$

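Define $\theta_0$ by $\psi'(\theta_0) = \xi_0$ and $\theta_1 > \theta_0$ by

$$\psi(\theta_1) - \psi(\theta_0) = c_1(\theta_1 - \theta_0). \qquad (A4)$$

Let $\xi_1 = \psi'(\theta_1)$, $\sigma_i^2 = \psi''(\theta_i)$ $(i = 0, 1)$, and $\Delta = \theta_1 - \theta_0$.

Let $\tau_+(\eta)$ $(\tau_-(\eta)) = \inf\{n : S_n - \eta n > (<)\ 0\}$, and put

$$\nu_+ = \mathrm{pr}_{\theta_0}\{\tau_+(c_1) = \infty\}\,\mathrm{pr}_{\theta_1}\{\tau_-(c_1) = +\infty\}/\{\Delta(\xi_1 - c_1)\}. \qquad (A5)$$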


Theorem 3. Assume that for all $\theta$, for all sufficiently large $n$ the $n$-fold convolution of $F_\theta$
has an integrable characteristic function. Suppose $b_0 \to \infty$ as $m \to \infty$ and $b_0 = O(\sqrt{m})$. Then

$$\mathrm{pr}_0(T < m_0 \mid S_{m_0} = m_0\xi_0) \sim \nu_+ \exp\big[-\Delta b_0 - m^{-1}b_0^2(\xi_1 - c_1)^{-2}\{(\xi_1 - \xi_0)^2/(2t_0\sigma_0^2) + \Delta c_2\}\big]. \qquad (A6)$$

In order to evaluate the constant $\nu_+$ defined in (A5) it may be helpful to use the local
expansion for small $c_1$ and $\theta_0$

$$\nu_+ = \exp(-\Delta\rho_+) + o(\Delta^2), \qquad (A7)$$

where $\rho_+ = E_0 S_{\tau_+}^2/(2E_0 S_{\tau_+})$. See Siegmund (1985, Chapter X) for a justification and
method for computing $\rho_+$ numerically. In the normal case this is the approximation (5). In
the exponential case the standardized generating distribution $dF_0(x)$ equals either
$\exp\{-(x + 1)\}\,dx$ $(x > -1)$ or $\exp(x - 1)\,dx$ $(x < 1)$, and the approximation (A7) is $\exp(-\Delta)$ or $\exp(-\Delta/3)$
respectively. These approximations were used in the numerical example in Section 4.

If $b_0 = o(\sqrt{m})$, the right hand side of (A6) is just Cramér's classical approximation to the
probability of ultimate ruin, as follows. Suppose $b(t) = c_1 t$, and consider $\mathrm{pr}_{\theta_0}(T < \infty)$. By
(A2) and (A4) the likelihood ratio of $x_1, \ldots, x_n$ under $\theta_0$ relative to $\theta_1$ is $\exp\{-\Delta(S_n - c_1 n)\}$.
Hence by Wald's likelihood ratio identity

$$\mathrm{pr}_{\theta_0}(T < \infty) = \exp(-\Delta b_0)\,E_{\theta_1}\exp\{-\Delta(S_T - b_0 - c_1 T)\}, \qquad (A8)$$

and by the renewal theorem

$$\lim_{b_0 \to \infty} E_{\theta_1}\exp\{-\Delta(S_T - b_0 - c_1 T)\} = \nu_+,$$

where $\nu_+$ is defined in (A5). See, for example, Siegmund (1985, Chapter VIII) for details.
Hence (A6) has the interpretation that if $b_0 = o(\sqrt{m})$, asymptotically the curvature of the
boundary plays no role and the conditional probability given $S_{m_0} = m_0\xi_0$ is effectively the
unconditional probability $\mathrm{pr}_{\theta_0}$ having the drift $\xi_0 = \psi'(\theta_0)$.
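The normal case gives a quick check on (A4) and (A8), since everything is explicit there. The calculation below is an illustration only and assumes $\psi(\theta) = \theta^2/2$, that is, a standard normal $F_0$.

```latex
\begin{align*}
\psi(\theta) = \theta^2/2 &\implies \theta_0 = \xi_0, \\
\tfrac12(\theta_1^2 - \theta_0^2) = c_1(\theta_1 - \theta_0)
  &\implies \theta_1 = 2c_1 - \xi_0, \quad
            \Delta = 2(c_1 - \xi_0), \quad
            \xi_1 - c_1 = c_1 - \xi_0, \\
\mathrm{pr}_{\theta_0}(T < \infty) &\approx \nu_+ \exp\{-2b_0(c_1 - \xi_0)\}.
\end{align*}
```

The exponential factor is the familiar probability $\exp(-2b_0\mu)$ that Brownian motion with drift $-\mu = \xi_0 - c_1$ ever reaches the level $b_0$; the constant $\nu_+$ accounts for the excess over the boundary in discrete time.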

In general the first term multiplying $m^{-1}b_0^2(\xi_1 - c_1)^{-2}$ on the right hand side of (A6) corrects for
the fact that we have a conditional, not an unconditional probability and the second corrects
for the curvature in $b(t)$. In fact, by (A3) one can modify (A8) to read

$$\exp(\Delta b_0)\,\mathrm{pr}_{\theta_0}(T < m_0) = E_{\theta_1}\big(\exp[-\Delta\{S_T - b_0 - m\,b(T/m)\} - \Delta c_2 T^2/m + O(T^3/m^2)];\ T < m_0\big).$$

It is easy to see that $b_0^{-1}T \to (\xi_1 - c_1)^{-1}$ in $\mathrm{pr}_{\theta_1}$-probability, and by Theorem 9.45 of Siegmund
(1985) the limiting distribution of the excess over the boundary, $S_T - b_0 - m\,b(T/m)$, is the
same as in the linear case. This explains the correction for non-linearity. The correction to
account for the conditional probability is obtained by a modification of the proof of Theorem
8.72 of Siegmund (1985). The details are much more complicated and have been omitted.

To illustrate the approximation (A6) suppose that $y_1, \ldots, y_m$ are independent standard
exponential, $W_n = y_1 + \ldots + y_n$, and consider Pettitt's test with acceptance region (14), or to
emphasize the invariance with respect to the exponential scale parameter

$$\Big\{\max_n\,(nW_m/m - W_n)/(W_m/m) < a\Big\}.$$

From the theorem one obtains the following approximations:

$$\mathrm{pr}\Big\{\max_{0<i<m-j}(i - W_{j+i}/x_0) \ge a - j \,\Big|\, W_j = jy_0,\ W_m = mx_0\Big\} \cong \exp\big[-\Delta(b_0 + 1/3) - 2^{-1}b_0^2\Delta^2/\{(m - j)\theta_1^2\}\big],$$

where $b_0 = a - j(1 - y_0/x_0)$, $\xi_0 = -j(1 - y_0/x_0)/(m - j)$, $\theta_0 = \xi_0/(1 - \xi_0) < 0$, and $\theta_1 > 0$
satisfies

$$\theta_1 - \log(1 + \theta_1) = \theta_0 - \log(1 + \theta_0);$$

$$\mathrm{pr}\Big\{\max_{0<i<j}(i - W_i/x_0) \ge a \,\Big|\, W_j = jy_0,\ W_m = mx_0\Big\} \cong \exp\big\{-\Delta(b_0 + 1) - 2^{-1}b_0^2\Delta^2/(j\theta_1^2)\big\},$$

where $b_0 = a - j(1 - y_0/x_0)$, $\xi_0 = -(1 - y_0/x_0)$, $\theta_0 = \xi_0/(1 + \xi_0) < 0$, and $\theta_1 > 0$ satisfies

$$\theta_1 + \log(1 - \theta_1) = \theta_0 + \log(1 - \theta_0).$$

The local expansion (A7) for $\nu_+$ is used in both these approximations. The analogous
approximations for the likelihood ratio test are similar but slightly more complicated since they involve
both the first and second derivatives of the boundary curve $h(t)$ at $t = j/m$, which are easily
obtained from (21).
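The two Pettitt-test approximations can be evaluated numerically once $\theta_1$ has been found from its defining equation. The sketch below follows the two displays above as they are read here: the factors $1/3$ and $1$ from (A7), and the denominators $(m - j)\theta_1^2$ and $j\theta_1^2$. It assumes $0 < y_0 < x_0$ and $b_0 > 0$. The function names and the bisection solver are illustrative choices, not the author's.

```python
import math

def _bisect_increasing(psi, target, lo, hi, iters=200):
    """Root of psi(t) = target on (lo, hi), assuming psi is increasing there."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def _check(a, j, m, y0, x0):
    if not (0.0 < y0 < x0 and 0 < j < m):
        raise ValueError("sketch assumes 0 < y0 < x0 and 0 < j < m")
    b0 = a - j * (1.0 - y0 / x0)
    if b0 <= 0.0:
        raise ValueError("sketch assumes b0 > 0")
    return b0

def thetas_after(j, m, y0, x0):
    """(theta0, theta1) for increments of the form 1 - y: psi(t) = t - log(1 + t)."""
    xi0 = -j * (1.0 - y0 / x0) / (m - j)
    theta0 = xi0 / (1.0 - xi0)
    psi = lambda t: t - math.log1p(t)
    hi = 1.0
    while psi(hi) < psi(theta0):  # enlarge the bracket until it contains the root
        hi *= 2.0
    return theta0, _bisect_increasing(psi, psi(theta0), 0.0, hi)

def thetas_before(y0, x0):
    """(theta0, theta1) for increments of the form y - 1: psi(t) = -t - log(1 - t)."""
    xi0 = -(1.0 - y0 / x0)
    theta0 = xi0 / (1.0 + xi0)
    psi = lambda t: -t - math.log1p(-t)
    hi = 1.0 - 1e-12
    if psi(theta0) >= psi(hi):
        raise ValueError("theta0 too extreme for this sketch")
    return theta0, _bisect_increasing(psi, psi(theta0), 0.0, hi)

def tail_after(a, j, m, y0, x0):
    """Approximate crossing probability for indices after j."""
    b0 = _check(a, j, m, y0, x0)
    theta0, theta1 = thetas_after(j, m, y0, x0)
    d = theta1 - theta0
    return math.exp(-d * (b0 + 1.0 / 3.0) - 0.5 * b0**2 * d**2 / ((m - j) * theta1**2))

def tail_before(a, j, m, y0, x0):
    """Approximate crossing probability for indices before j."""
    b0 = _check(a, j, m, y0, x0)
    theta0, theta1 = thetas_before(y0, x0)
    d = theta1 - theta0
    return math.exp(-d * (b0 + 1.0) - 0.5 * b0**2 * d**2 / (j * theta1**2))
```

Adding `tail_after(...)` and `tail_before(...)` for a candidate $j$ gives an attained significance level built from two terms, to be compared with the nominal level when the test is inverted.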

If we invert the Pettitt test to obtain a 95% confidence set for a change-point in the coal
mining accident data, we find the same confidence set as in Section 4, with one exception. The
attained significance level of $j = 129$ is .039 + .014 > .05, so $j = 129$, corresponding to the
year 1894, is included in the confidence set.